Abstract - The increasingly wide application of Cloud Computing
enables the consolidation of tens of thousands of applications
in shared infrastructures. Meeting the QoS requirements
of so many diverse applications in such shared resource
environments has thus become a real challenge, especially since the
characteristics and workload of applications differ widely and
may change over time. This paper presents an experimental
system that can exploit a variety of online QoS-aware adaptive
task allocation schemes; three such schemes are designed
and compared. These are a measurement-driven algorithm that
uses reinforcement learning, a "sensible" allocation
algorithm that assigns jobs to sub-systems that are observed
to provide a lower response time, and an algorithm that
splits the job arrival stream into sub-streams at rates computed
from the hosts' processing capabilities. All of these schemes
are compared via measurements among themselves and with a
simple round-robin scheduler, on two experimental test-beds with
homogeneous and heterogeneous hosts having different processing
capacities.

Index Terms —Cognitive Packet Network, Random Neural 
Network, Reinforcement Learning, Sensible Decision Algorithm, 
Task allocation, Cloud Computing, Job Scheduling, Round Robin 


I. Introduction 

Cloud computing enables elasticity and scalability of computing
resources such as networks, servers, storage, applications,
and services, which constitute a shared pool, providing
on-demand services at the level of infrastructure, platform
and software [?]. This makes it realistic to deliver computing
services in a manner similar to utilities such as water and
electricity, where service providers take the responsibility of
constructing IT infrastructure and end-users make use of the
services through the Internet in a pay-as-you-go manner. This
convenient and cost-effective way of accessing services boosts
the application of cloud computing, which spans many domains
including scientific, health care, government, banking,
social networks, and commerce [?].

An increasing number of applications from the general
public or enterprise users are running in the Cloud, generating
a diverse set of workloads in terms of resource demands,
performance requirements and task execution [?]. For example,
multi-tier web applications composed of several components,
which are commonly deployed on different nodes [?], impose
varied stress on the respective nodes and create interactions
across components. Energy consumption remains a major
issue [?] that can be mitigated through judicious
energy-aware scheduling [?], [?]. Jobs being executed in a Cloud
environment may be of very different types, such as Web
requests that usually demand fast response and produce loads
that vary significantly over time [?]. On the other hand,
scientific applications are computation intensive, though during
execution they may undergo several phases with varied
workload profiles [?], [?]. MapReduce jobs are composed of
different tasks of various sizes and resource requirements [8].
Furthermore, the nature of cloud computing, which enables
highly heterogeneous workloads to be served on top of a
shared IT infrastructure, leads to inevitable interference
between co-located workloads [?]. At the same time, end users
not only rely on the computation resources provisioned by the
cloud, but also require assurance of the quality and reliability
of the execution of the jobs that they submit. Therefore,
the cloud service provider must also dispatch incoming jobs
to servers with consideration for the quality of service and
cost that it offers within a diverse and complex workload
environment.

A. Prior Work 

Extensive research on this challenging problem has proposed
several job scheduling approaches. Static algorithms
[?], [?], [?] are simple and incur little overhead, but they
are only suitable for stable environments, and cannot adapt to
changes in a Cloud. Dynamic algorithms [?], [?], [?], [?]
take into consideration different application characteristics and
workload profiles both prior to, and during, run-time. They
can handle heterogeneous environments and adapt to dynamic
conditions, but they may be quite complex, and the resulting computation
overhead may cause performance degradation when implemented
in a real system. Thus, many of them have only been evaluated
through simulations [?] rather than in practical experiments,
while some have been tested in a real computer environment
with low job arrival rates [?].

Much work on task assignment in the Cloud is based
on a detailed representation of the tasks to be executed, but a
rather simplistic representation of the hosts or processing
sub-systems, leading to an evaluation based on simulation
experiments rather than measurements on a real system. In
[?], an application composed of many tasks is represented by
a directed acyclic graph (DAG) in which tasks, inter-task
dependency, computation cost, and inter-task communication cost are
represented; two performance-effective and low-complexity
algorithms rank the tasks to assign them to a processor in
a heterogeneous environment. Related work is presented in
[?], [?], while optimization algorithms based on genetic
algorithms [22], ant colony optimization (ACO) [23], Particle
Swarm Optimization [24], Random Neural Network optimization
[?], [26], and auction-based mechanisms [?] have also
been studied in this context, with potential applications to
workload scheduling in the Cloud [28]. In [29], workload
models which reflect the diversity of users and tasks in a Cloud
production environment are obtained from a large number of
tasks and users over a one month period, and exploited for
evaluation in a simulated CloudSim framework.

Other work has used experiments on real test-beds rather
than simulations, as in [?] where the characteristics of
typical heterogeneous workloads (parallel batch jobs, web servers,
search engines, and MapReduce jobs) are used to provision
resources in a manner that reduces costs for the Cloud itself.
Another cost-effective resource provisioning system dedicated
to MapReduce jobs [?] uses global resource optimization.
Hardware platform heterogeneity and co-scheduled workload
interference are highlighted in [?], where robust analytical
methods and collaborative filtering techniques are used to
classify incoming workloads in terms of heterogeneity and
interference before they are greedily scheduled in a manner
that minimizes interference and maximizes server utilization.
The system is evaluated with a wide range of
workload scenarios on both a small-scale computer cluster
and a large-scale cloud environment using Amazon EC2 to
show its scalability and low computation overhead. However,
the arrival rate of incoming workload is low and thus the
system performance under saturation is not examined.
Furthermore, the contention for processor cache, memory
controller and memory bus incurred by collocated workloads
is studied in [?].

Early research that considers the important role of servers in
delivering QoS in the Internet can be found in [32], where an
architecture is proposed which provides web request
classification, admission control, and scheduling with several priority
policies to support distinct QoS requirements for different
classes of users of multi-tier web applications. However, the
scheduling approach is static. In [?], an adaptive feedback
driven resource control system is developed to dynamically
provision resource sharing for multi-tier applications in order
to achieve both high resource utilization and application-level
QoS. A two-tiered on-demand resource allocation mechanism
is presented in [33], with local allocation within a server and
global allocation based on each local one, so as to achieve
better resource utilization and adjust dynamically to
time-varying capacity demands. Energy consumption in
computation, data storage and communications is also a challenge
in the cloud [?]. A model for server performance and power
consumption is derived in [34], with the potential to predict
power usage in terms of workload intensity. In [35], [7],
the authors examine the selection of system loads that provide
the best trade-off between energy consumption and QoS. A
heterogeneity-aware dynamic capacity provisioning scheme
for cloud data centers is proposed in [36], which classifies
workloads based on the heterogeneity of both workload and
machine hardware, and dynamically adjusts the number of
machines so as to optimize overall energy consumption and
scheduling delay.


B. Overview of our Approach 

The present paper uses experiments to investigate adaptive
dynamic allocation algorithms that take decisions based on
up-to-date measurements, and make fast online decisions to
attempt to achieve desirable QoS levels [37]. The software that we
have designed to this effect is a practical system implemented
as a Linux kernel module which can be easily installed and
loaded on any PC with the Linux OS. Its design is inspired by
the Cognitive Packet Network [38], which is a QoS-driven adaptive
routing protocol that selects paths in a network to provide the
best possible QoS for the network's traffic based on online
measurement.

In the approach we propose, we embed measurement agents
into each host in a cloud to observe the system state. These
observations are then collected by "smart packets" (SPs) that
are sent at regular intervals into the system in a manner which
favours the search of those sub-systems which are of the
greatest interest because they may be used more frequently or
because they could provide better performance. The SPs then
come back to the controller, which uses a dynamic algorithm
based either on the Random Neural Network (RNN) [38], [39],
or on a form of online greedy adaptation called "sensible
routing" [?] that selects probabilistically the host whose
measured QoS is the best. We also study a task allocation
scheme that splits the incoming job arrival stream into sub-streams
towards the different hosts, at fixed arrival rates chosen so as to
take the best advantage of the hosts' relative processing ability.
We have conducted experiments with the RNN-based scheme,
the sensible routing scheme, and the fixed rate scheme under
varied job arrival rates on a real computer cluster
test-bed, and compared them with static algorithms
such as Round Robin and an allocation scheme that distributes
the jobs equally between hosts. To further our investigation,
we have set up two different cluster test-beds: one composed
of hosts with relatively uniform processing ability, and the other
with an increasing processing capacity difference between
hosts. The resulting experimental results are carefully analyzed
and reported.

The remainder of the paper is organized as follows. We
detail the novel task allocation platform that we propose
in Section II, where the dynamic algorithms are introduced.
Section II-A2 proposes a mathematical model for our system,
which leads to the design of a fixed arrival rate based allocation
scheme. Experimental results are presented in Section III
to compare the performance obtained with all of the above
allocation schemes. The final section draws our main conclusions
and discusses directions for future research.


II. Task Allocation Platform and Test-Bed 

In this section we propose a task allocation platform (TAP)
where online monitoring and measurement are constantly carried
out in order to keep track of the state of the cloud system,
including the current resource utilisation (CPU, memory, and
I/O), the system load, the application-level QoS requirements,
such as job response time and bandwidth, as well as energy
consumption, and possibly also (in future versions of the
system) system security and economic cost. With knowledge
learned from these observations, the system employs the QoS
driven task allocation algorithms that we have designed, to
make online decisions that can achieve the best possible QoS
as specified by the tasks' owners, while adapting to varying
conditions over time.

Figure 1 shows the building blocks of our platform. The
controller, which is the intellectual center of the system,
accommodates the online task allocation algorithms, which usually
work along with a learning algorithm, with the potential
to adaptively optimize the use of the cloud infrastructure. Our
platform penetrates into the cloud infrastructure by deploying
measurement agents: these agents conduct online observations
that are relevant to the QoS requirements of end users, and
send back the measurements to the controller. Using ideas
from "Cognitive Packet Network" routing in packet networks
[?], [?], three types of packets are used for communications
between the components within the platform: smart packets
(SPs) for discovery and measurement, dumb packets (DPs) for
carrying job requests or jobs, and acknowledgement packets
(ACKs) that carry back the information that has been discovered
by SPs and experienced by allocated jobs. In this section,
we present in detail the mechanisms that are implemented in
the platform and the algorithms that are used.



Fig. 1. System Architecture of the Task Allocation Platform (TAP) 


SPs are first sent at random to the various hosts in order to 
obtain some initial information and inform the measurement 
agents in the hosts to activate the requested measurement. The 
task allocation algorithm in TAP learns from the information 
carried back by the ACKs and makes adaptively optimized 
decisions which are used to direct the subsequent SPs. Thus, 
the SPs collect online measurements in an efficient manner 
and pay more attention to the part of the cloud where better 
QoS can be offered, visiting the worse performing parts less 
frequently. 

The incoming jobs or job requests are encapsulated into
DPs, which exploit the decisions explored by SPs to select
the host/cloud sub-system that will execute the job. Once
a job (request) arrives at a host in the cloud, its monitoring
is started by the measurement agent, which records the trace
of the job execution until it is completed and deposits the
records into a mailbox which is located in the kernel memory
of the host. When an SP arrives at this host, it collects the
measurements in the mailbox and generates an ACK which
carries the measurements and travels back to the controller,
where the measurement data is extracted and used for subsequent
decisions of the task allocation algorithm. As soon
as a job completes its execution, the agent also produces an
ACK heading back to the controller with all the recorded data,
such as the job arrival time at the cloud, the time at which the
job started running and the time at which the job execution
completed. When the ACK of the DP reaches the controller,
the job response time at the controller is estimated as
the difference between the arrival time of the ACK at this node and
the time at which the corresponding job arrived at the controller.
This estimate is used by the algorithm when the job response time is
the quantity to be minimized.

TAP may use different schemes to make decisions regarding
task allocation, and in the sequel we will describe two
randomized schemes in Section II-A, as well as a scheme based
on Reinforcement Learning [43] which uses the random neural
network model (Section II-B) as the adaptive critic for the goal or cost
function to be minimized.


We conduct our experiments on a real test-bed cluster
composed of four nodes. One node is dedicated to the TAP,
and the other three nodes are used as hosts running jobs, as
shown in Figure 5, with each having a different processing
power so that we may observe significant execution time
differences for a given job running on each of the hosts.
TAP takes decisions based on online measurements which are
collected by SPs. Even when there are no incoming jobs,
the system maintains awareness of the state of the cloud by
sending SPs periodically. End users are allowed to declare
the QoS requirements related to the jobs they are planning to
submit, which are then translated into one or more QoS metrics
which constitute a function called the "goal function" in our
system. In this way, the QoS requirements are transformed
into a goal function to be minimized, e.g. the minimization
of the job response time. The goal function determines which
system parameters need to be measured and how optimized
task allocation is to be carried out.


A. TAP’s Randomized Task Allocation Schemes 

By a randomized task allocation scheme for TAP, we mean
that when a job arrives at TAP from some user or source
outside the Cloud system, TAP decides to allocate it to some
host $i$ among the $N$ available hosts with probability $p_i$, so that
at decision time when the task must be allocated:

• TAP first calculates $p_i$ for each of the hosts $i$,

• Then TAP uses these probabilities to actually select the
host that will receive the task.

Randomized schemes have the advantage that a host which
is being preferred because it is providing better service is not
overloaded by repeated allocation, since the QoS it offers
is only used probabilistically to make a task allocation.

To this effect, TAP uses two distinct schemes to calculate
$p_i$: Sensible Routing and Model Based Allocation.

1) Sensible Routing: The sensible decision algorithm was
proposed in [40] as an adaptive routing algorithm which
applies randomized routing policies based on the expected
QoS so as to improve QoS. We use the sensible decision
algorithm in our task allocation system, where allocation
policies are described as probabilistic choices among all available
hosts. QoS metrics are defined as non-negative random
variables. A QoS metric is viewed as sensitive when the value
of the QoS metric corresponding to a host increases as the
probability of dispatching jobs to that host increases. Job
response time and job execution time are examples of sensitive
metrics.

In the Sensible Routing approach that we propose for TAP,
we use a weighted average $G_i$ of the goal function that we
wish to minimize. $G_i$ is estimated for each of the hosts $i$,
and updated each time $t$ that TAP receives a measurement that
can be used to update the goal function. If the measured total
job response time $G_i^t$ is received at the TAP regarding host
$i$, then the following expression is used to update TAP's
estimate of $G_i$:

$$G_i^n = (1 - \alpha)\, G_i^{n-1} + \alpha\, G_i^t, \qquad (1)$$

where the parameter $0 < \alpha < 1$ is used to vary the weight
given to the most recent measurement as compared to past
values, and $G_i^n$ denotes the value of the goal function obtained
after the $n$-th update. The probability $p_i^S$ that may then be used
to allocate a task to host $i$ will be:

$$p_i^S = \frac{1 / G_i^n}{\sum_{j=1}^{N} 1 / G_j^n}. \qquad (2)$$

Of course, when TAP needs to allocate a task, it will use the
most recent value of $p_i^S$ which is available.
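
As an illustration, the update of the goal-function estimate and the resulting randomized choice can be sketched in a few lines of Python. This is a minimal sketch rather than the TAP implementation: the function names, the default value of `alpha`, and the use of Python's `random` module are assumptions made for the example.

```python
import random


def update_goal(g_prev, g_measured, alpha=0.5):
    """Weighted average of the goal function: (1 - alpha) * old + alpha * new."""
    return (1.0 - alpha) * g_prev + alpha * g_measured


def sensible_probabilities(goals):
    """Allocation probabilities inversely proportional to each host's estimate."""
    inverses = [1.0 / g for g in goals]  # goals are assumed strictly positive
    total = sum(inverses)
    return [inv / total for inv in inverses]


def choose_host(goals, rng=random):
    """Pick a host index at random according to the sensible probabilities."""
    probs = sensible_probabilities(goals)
    return rng.choices(range(len(goals)), weights=probs, k=1)[0]
```

With estimates such as `[1.0, 2.0, 4.0]` the first host is selected most often, but the other two still receive a share of the jobs, which is the property that keeps a preferred host from being overloaded.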

2) Model Based Task Allocation: Model Based Allocation
uses a mathematical model to predict the estimated performance
at a host in order to make a randomized task allocation.
This has been used in earlier work concerning task allocation
schemes that help reduce the overall energy consumed in a
system [7]. In this approach, if $W_i(\lambda, p_i)$ is the relevant QoS
metric obtained for host $i$ by allocating a randomized fraction
$p_i$ of jobs to host $i$ when the overall arrival rate of jobs to TAP
is $\lambda$, then the allocation probabilities $p_1, \ldots, p_N$ are chosen
so as to minimize the overall average QoS metric:

$$W = \sum_{i=1}^{N} p_i\, W_i(\lambda, p_i). \qquad (3)$$

At first glance, since each host $i$ is a multiple-core machine
with $C_i$ cores, a simple mathematical model that can be used
to compute, say, the QoS metric "response time" $W_i(\lambda, p_i)$
that host $i$ provides, assuming that there are no main memory
limitations and no interference among processors (for instance
for memory or disk access), is the $M/M/C_i$ queueing model
[?], i.e. with Poisson arrivals, exponential service times,
and $C_i$ servers. Of course, both the Poisson arrival and the
exponential service time assumptions are simplifications of
reality, and more detailed and precise models are also possible,
for instance using diffusion approximations [?], but would
require greater computational effort and more measurement
data.

However, a set of simple experiments that we have conducted
shows that the M/M/K model for each host would not
correspond to reality. Indeed, in Figure 2 we report the
measured completion rate of jobs on a host (y-axis), relative
to the execution time for a single job running by itself, as
a function of the number of simultaneously running jobs
(x-axis). These measurements were conducted on a single host
(Host 1); for a single job running on the system, the
average job processing time was 64.1 ms.

If this were a perfectly running ideal parallel processing
system, we would observe something close to a linear increase
in the completion rate of jobs (red dots) as the number
of simultaneously running jobs increases, until the number
of cores in the machine, $C_i$, has been reached. However, the
measurements shown in Figure 2 (blue dots) indicate a significant
increase in completion rate as the number of jobs goes
from 1 to 2, but then the rate remains constant, which reveals
that there may be significant interference between jobs due to
competition for resources. Indeed, if we call $\gamma(l)$ the average
completion rate per job, we observed the following values
for $\gamma(l)/\gamma(1)$ for $l = 2, \ldots, 10$, computed to two decimal
digits: 0.67, 0.48, 0.34, 0.29, 0.23, 0.20, 0.17, 0.15, 0.13. From
this data, a linear regression estimate was then computed for
the average execution time $\mu(l)^{-1}$ when there are $l$ jobs
running simultaneously, as shown in Figure 3, yielding a
quasi-linear increase. As a result we can quite accurately use
the estimate $l \cdot \gamma(l)/\gamma(1) \approx 1.386$.

Fig. 2. The ideal service rate provided by the perfect multiple core system
(red), compared to the measured job completion rate on Host 1 (blue), plotted
against the number of jobs running simultaneously on the host (x-axis).

Fig. 3. Measurement of the effective job execution time per job on Host 1,
versus the number of simultaneously running jobs on the host (x-axis).

Based on this measured data, we model the distribution of the
number of jobs in a host server $i$ as a random walk on the
non-negative integers, where:

• $l = 0$ represents the empty host (i.e. with zero jobs at the
host),

• The transition rate from any state $l \ge 0$ to state $l + 1$ is
the arrival rate of jobs to the host, $\lambda_i$,

• The transition rate from state 1 to state 0 is $\mu_i(1) = T_i^{-1}$,
where $T_i$ is the average execution time of a job (by
itself) on the host,

• The transition rate from state $l + 1$ to state $l$ for $l \ge 1$ is
quasi constant, given by $\mu_{i0} \equiv (l \cdot \gamma(l)/\gamma(1))\, \mu_i(1)$,

• The arrival rate of jobs to Host $i$ is $\lambda_i = p_i^m \lambda$, where $p_i^m$
is the probability with which TAP using the model based
algorithm assigns jobs to Host $i$, and $\lambda$ is the overall
arrival rate of jobs to TAP.
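
The quasi-constant factor that multiplies the single-job completion rate can be checked directly from the measured ratios quoted in the text. The short Python sketch below only redoes that arithmetic; the variable names are ours, and the plain average it computes is an assumption for illustration, since the value 1.386 reported in the paper comes from a linear regression and need not coincide with it exactly.

```python
# Measured ratios gamma(l) / gamma(1) for l = 2, ..., 10, as quoted in the text.
ratios = [0.67, 0.48, 0.34, 0.29, 0.23, 0.20, 0.17, 0.15, 0.13]

# Aggregate completion rate with l jobs, relative to a single job: l * gamma(l) / gamma(1).
products = [l * r for l, r in zip(range(2, 11), ratios)]

# Plain average of the products, for comparison with the regression-based estimate.
mean_product = sum(products) / len(products)

print([round(p, 2) for p in products])
print(round(mean_product, 3))
```

All the products stay within a narrow band, which is what justifies treating the departure rate as constant for two or more jobs.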

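The probability that there are $l$ jobs at Host $i$ in steady-state
is then:

$$p_i(1) = p_i(0)\, \frac{\lambda_i}{\mu_i(1)},$$

$$p_i(l) = \left( \frac{\lambda_i}{\mu_{i0}} \right)^{l-1} p_i(1), \quad l > 1,$$

$$p_i(0) = \frac{1 - \frac{\lambda_i}{\mu_{i0}}}{1 + \lambda_i\, \frac{\mu_{i0} - \mu_i(1)}{\mu_{i0}\, \mu_i(1)}},$$

and the resulting average response time for jobs arriving to
Host $i$, which by Little's formula [44] is equal to the average
number of jobs divided by the arrival rate at Host $i$,
becomes:

$$W_i^m = \frac{p_i(0)}{\mu_i(1) \left( 1 - \frac{\lambda_i}{\mu_{i0}} \right)^2}, \qquad (4)$$

and the overall average response time that we wish to
minimize, by choosing the $p_i^m$ for a given $\lambda$, is:

$$W^m = \sum_{i=1}^{N} p_i^m\, \frac{p_i(0)}{\mu_i(1) \left( 1 - \frac{\lambda_i}{\mu_{i0}} \right)^2}. \qquad (5)$$

The appropriate values of the $p_i^m$ for a given system and a
given arrival rate $\lambda$ can then be obtained numerically.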
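
One simple way to carry out such a numerical search is an exhaustive scan over a grid of allocation probabilities. The Python sketch below assumes the random-walk model described above (arrival rate `lam`, completion rate `mu1` with a single job, quasi-constant rate `mu0` with two or more jobs); the function names, the grid resolution and the service rates in the usage lines are hypothetical values chosen for illustration, not the test-bed measurements.

```python
def host_response_time(lam, mu1, mu0):
    """Average response time of one host under the random-walk model."""
    if lam >= mu0:
        return float("inf")  # no steady state when arrivals outpace departures
    rho = lam / mu0
    p0 = (1.0 - rho) / (1.0 + lam * (mu0 - mu1) / (mu0 * mu1))
    return p0 / (mu1 * (1.0 - rho) ** 2)


def overall_response_time(probs, total_rate, mu1s, mu0s):
    """Probability-weighted average of the per-host response times."""
    total = 0.0
    for p, mu1, mu0 in zip(probs, mu1s, mu0s):
        if p > 0.0:  # hosts that receive no jobs contribute nothing
            total += p * host_response_time(p * total_rate, mu1, mu0)
    return total


def best_split(total_rate, mu1s, mu0s, steps=50):
    """Grid search over allocation probabilities for three hosts."""
    best_w, best_probs = float("inf"), None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            probs = (a / steps, b / steps, (steps - a - b) / steps)
            w = overall_response_time(probs, total_rate, mu1s, mu0s)
            if w < best_w:
                best_w, best_probs = w, probs
    return best_w, best_probs


# Hypothetical single-job completion rates (jobs/sec) for three hosts.
mu1s = (15.6, 10.0, 5.0)
mu0s = tuple(1.386 * m for m in mu1s)
w_min, split = best_split(20.0, mu1s, mu0s)
```

When `mu0` equals `mu1` the per-host expression reduces to the familiar M/M/1 response time `1 / (mu - lam)`, which is a convenient sanity check. A finer grid or a constrained optimizer could replace the scan if more hosts were involved.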

To illustrate this approach for the specific service time
data regarding the three hosts that we use in our test-bed,
in Figure 4 we show the variation of the average job response
time with different combinations of $[\lambda_1, \lambda_2, \lambda_3]$, when
$\lambda = 20$ jobs/sec.


B. Random Neural Network-based Allocation 

The Random Neural Network (RNN) has been used in
static task allocation problems [26], as well as for dynamic
allocation of traffic to routes in packet networks [42]. The
RNN comprised of $N$ neurons is often used in a "recurrent"
or fully connected form [46], where each neuron $i$ is
characterized by an integer $k_i(\tau) \ge 0$ where $\tau$ represents
time, and each neuron is connected to other neurons by both
excitatory and inhibitory weights. Furthermore, for the specific
application that we are considering in the TAP, each neuron
is identified with a particular host, i.e. neuron $i$ is identified
with the decision to assign a task to Host $i$. The theoretical
underpinning of the RNN [?], [?] is based on a theorem
that states that, at the equilibrium state, the probabilities:

$$q_i = \lim_{\tau \to \infty} \mathrm{Prob}[k_i(\tau) > 0], \qquad (6)$$



Fig. 4. Variation of the overall average job response time predicted by
the infinite server model, with different combinations of $[\lambda_1, \lambda_2, \lambda_3]$, when
$\lambda = \lambda_1 + \lambda_2 + \lambda_3$ is set to 20 jobs per second.


are obtained from the expression:

$$q_i = \frac{\Lambda(i) + \sum_{j=1}^{N} q_j\, w^+(j,i)}{r(i) + \lambda(i) + \sum_{j=1}^{N} q_j\, w^-(j,i)}, \qquad (7)$$

where the $w^+(j,i)$ and $w^-(j,i)$ are the excitatory and
inhibitory weights from neuron $j$ to neuron $i$, and $\Lambda(i)$ and
$\lambda(i)$ are the external flows or inputs of external excitatory and
inhibitory signals to neuron $i$, while $r(i)$ is the firing or total
activity rate of neuron $i$:

$$r(i) = \sum_{j=1}^{N} \left[ w^+(i,j) + w^-(i,j) \right]. \qquad (8)$$
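
Because each $q_i$ appears on both sides of the expression above, the probabilities are usually obtained by fixed-point iteration. The Python sketch below is one simple way to do so and is not taken from the TAP code: the function name, the starting point of 0.5, the fixed iteration count and the capping of each value at 1 are assumptions made for the example.

```python
def rnn_steady_state(w_plus, w_minus, ext_exc, ext_inh, iterations=200):
    """Iterate q_i = (Lambda_i + sum_j q_j w+[j][i]) / (r_i + lambda_i + sum_j q_j w-[j][i]).

    w_plus[j][i] and w_minus[j][i] are the weights from neuron j to neuron i;
    ext_exc and ext_inh hold the external excitatory and inhibitory rates.
    """
    n = len(ext_exc)
    # Total firing rate of each neuron: sum of its outgoing weights.
    r = [sum(w_plus[i][j] + w_minus[i][j] for j in range(n)) for i in range(n)]
    q = [0.5] * n
    for _ in range(iterations):
        new_q = []
        for i in range(n):
            num = ext_exc[i] + sum(q[j] * w_plus[j][i] for j in range(n))
            den = r[i] + ext_inh[i] + sum(q[j] * w_minus[j][i] for j in range(n))
            new_q.append(min(num / den, 1.0))  # a probability cannot exceed 1
        q = new_q
    return q
```

A production implementation would normally stop when successive iterates differ by less than a tolerance rather than after a fixed number of passes.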

In the present case, we assume that a distinct RNN is set up 
within the TAP to cover each distinct goal function. However, 
these different RNNs would not have to be created in advance 
and stored at the TAP indefinitely, but instead created when 
they are actually needed. Thus we imagine that we may have 
a different RNN that is used to decide about allocations that 
involve minimizing the economic cost of a task allocation (as 
when the end users are expected to pay a monetary price 
for the work they receive), or a different one that deals with 
minimizing response time, and so on. 

Suppose that the goal function to be minimized is denoted
by $G$, such as the response time to incoming jobs or the
execution time of jobs. Before collecting any measurements
in the system we initialize the decision system with a parameter
set $q_i = 0.5$ for all $i$, obtained by setting
$w^+(i,j) = w^-(i,j) = 1/(2(N-1))$ for $i \ne j$, and set all "self"
excitation and inhibition rates (from the neuron to itself) to
zero. Thus $r(i) = 1$ for all $i$, and $\Lambda(i) = 0.25 + 0.5\lambda(i)$. In
particular we can choose $\lambda(i) = 0$ so that all $\Lambda(i) = 0.25$.

TAP will then use the $q_i$, $i = 1, \ldots, N$ to make allocations
so that the task is assigned to the host with the highest value
of $q_i$; with the initial values chosen, any one of the hosts will
be chosen with equal probability. However, with successive
updates of the weights, this will change so that TAP will select
the "better" hosts which provide a smaller value of $G$.

Thus when TAP receives an observation or measurement $G_i^t$
with regard to the goal function that one wishes to minimize,
the RNN weights are updated as follows:

• We first update a decision threshold $T_l$ as

$$T_l = \alpha\, T_{l-1} + (1 - \alpha)\, G_i^t, \qquad (9)$$

where $0 < \alpha < 1$ is a parameter used to vary the relative
importance of "past history".

• Then, if $G_i^t < T_l$, it is considered that the advice
provided by the RNN in the past was successful and TAP
updates the weights as follows:

$$w^+(j,i) \leftarrow w^+(j,i) + G_i^t,$$

$$w^-(j,k) \leftarrow w^-(j,k) + G_i^t/(n-2), \quad \text{if } k \ne i,$$

• else if $G_i^t > T_l$:

$$w^+(j,k) \leftarrow w^+(j,k) + G_i^t/(n-2), \quad \text{if } k \ne i,$$

$$w^-(j,i) \leftarrow w^-(j,i) + G_i^t.$$

After the weights are updated, the $q_i$ are computed again with
the new weights. We note that this algorithm will tend to increase
the probability $q_i$ of those neurons which correspond to hosts
that yield a better value of $G_i$, which is why each time TAP
assigns a task to a host, it uses the host $i$ that corresponds to
the largest $q_i$.
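
The threshold and weight updates just described can be written compactly as follows. This Python sketch follows the update rules as stated in the text, with several assumptions that are ours: the update is applied for every source neuron $j$ other than the target, self-weights are left at zero, the divisor uses the number of neurons, and the case where the measurement equals the threshold (not covered by the text) is treated as a failure.

```python
def update_threshold(threshold, g_measured, alpha=0.8):
    """Smoothed decision threshold: alpha * old threshold + (1 - alpha) * measurement."""
    return alpha * threshold + (1.0 - alpha) * g_measured


def reinforce(w_plus, w_minus, host, g_measured, threshold):
    """Update the weight matrices in place after a measurement for `host`."""
    n = len(w_plus)
    if n < 3:
        raise ValueError("the update divides by n - 2, so n must be at least 3")
    share = g_measured / (n - 2)
    success = g_measured < threshold  # equality is treated as failure (assumption)
    for j in range(n):
        for k in range(n):
            if j == k:
                continue  # self excitation and inhibition stay at zero
            if k == host:
                if success:
                    w_plus[j][k] += g_measured   # reward the chosen host
                else:
                    w_minus[j][k] += g_measured  # punish the chosen host
            else:
                if success:
                    w_minus[j][k] += share       # discourage the alternatives
                else:
                    w_plus[j][k] += share        # encourage the alternatives
```

In a full decision loop, a call to `update_threshold` would precede `reinforce`, and the steady-state probabilities would be recomputed from the new weights before the next allocation.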

In order to make sure that TAP tries out other alternatives and
does not miss out on better options, 10% of the decisions are
made at random: thus on average one out of ten decisions is
based on a random (equally likely) choice among all hosts,
while 90% of the decisions are based on the optimization
algorithm that we have described.
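
This mix of exploration and exploitation amounts to a few lines of code. The following Python sketch is illustrative only; the function name, the `explore_prob` parameter and the use of Python's `random` module are assumptions of the example.

```python
import random


def select_host(q, explore_prob=0.1, rng=random):
    """Return a host index: random with probability explore_prob, else the largest q."""
    if rng.random() < explore_prob:
        return rng.randrange(len(q))  # equally likely choice among all hosts
    return max(range(len(q)), key=lambda i: q[i])
```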

Note also that this algorithm can be modified to a "sensible"
version where:

$$p_i^{RNN-S} = \frac{q_i}{\sum_{j=1}^{N} q_j}. \qquad (10)$$

III. Experiments with the Task Allocation 
Platform 

Our proposed platform TAP is a practical system which can
exploit several different task allocation algorithms, such as the
three that have been described above, and it is implemented
as a Linux kernel module which can be easily installed and
loaded on any PC with Linux OS. We have implemented TAP
for a cluster of three hosts for job execution and a separate
host working as the controller, to which the three hosts are
connected, as shown in Figure 5.

For the purpose of our experiments, a synthetic benchmark
is generated, with job profiles indicated by using the fields
{Job ID, QoS requirement, Job Size}, which are then
packetized into an IP packet and sent to the controller. The job
request generator can be configured to send job requests either
at a fixed inter-job interval, denoted "constant rate" or CR, or
following a Poisson process with independent and identically
distributed inter-job arrival intervals with a given rate (denoted
EXP). The controller, where TAP is running, decides on job
placement based on the measurements carried back by ACKs
and deposited in the mailbox.
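
The two arrival patterns can be generated as in the following Python sketch. It is not the generator used in the experiments; the function name and the `mode` strings simply mirror the CR and EXP labels used in the text.

```python
import random


def inter_arrival_times(rate, count, mode="CR", rng=None):
    """Return `count` inter-job intervals (seconds) for an average rate in jobs/sec."""
    rng = rng or random.Random()
    if mode == "CR":
        return [1.0 / rate] * count  # fixed inter-job interval
    if mode == "EXP":
        # Exponentially distributed intervals give a Poisson arrival process.
        return [rng.expovariate(rate) for _ in range(count)]
    raise ValueError("mode must be 'CR' or 'EXP'")
```

Both modes have the same mean interval `1 / rate`, so they offer the same average load while differing in burstiness.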


The experiments we report in this paper were run with jobs
that were defined as a "prime number generator with an upper
bound B on the prime number being generated". Thus the
choice of B allowed us to vary both the execution time and the
memory requirements of the job. We did not actually "transfer"
the jobs from the task controller to the host, but rather installed
the job in advance on the host, and the allocation decision by
TAP just resulted in the arrival of a message from TAP to activate
the job with a specific value of B on that particular host. The
measurement agent resident on that host then monitored the
job execution and recorded its measurements into the mailbox.
Both the jobs and the measurement agent run in the user's
memory space, while the module that receives the SPs and
job requests carried by DPs, collects measurements from the
mailbox, and generates ACKs with the collected measurements
runs in the kernel space of memory, as shown in Figure 5, so
that interference between the user program and the system
aspects is avoided, at least within the memory.
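
To give a concrete idea of such a job, a prime number generator with an upper bound B can be written as a sieve of Eratosthenes. The Python sketch below is only an example of this kind of workload; the paper does not state which algorithm or language the benchmark used, so the choice of a sieve is an assumption.

```python
def primes_up_to(bound):
    """Return all primes <= bound; time and memory both grow with the bound."""
    if bound < 2:
        return []
    is_prime = bytearray([1]) * (bound + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= bound:
        if is_prime[p]:
            # Clear every multiple of p starting from p * p.
            is_prime[p * p::p] = bytearray(len(range(p * p, bound + 1, p)))
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]
```

With this formulation the array of flags grows linearly with the bound, which is consistent with the remark that B controls both execution time and memory use.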

We set up the six experimental scenarios listed in Table I.
The two QoS goals that were considered were (a) the
minimization of the execution time on the host, and (b) the
minimization of the response time at TAP, which includes the
message sent to activate the job at a host and the time it takes
for an ACK to provide information back to the TAP, i.e. job
execution time and job response time at the controller.

We first used TAP with the RNN algorithm with Reinforce¬ 
ment Learning (RL) as described above, and TAP with the 
sensible decision algorithm, and compared their performance. 

The RNN based TAP was tested with both (a) and 
(b), whereas the sensible decision based TAP only used (b), the 
job response time at the controller. In addition, the analytical 
model based approach used (b): the job response time was 
computed in terms of the job arrival rate and the system 
service rate, and then used to determine the optimum values 
of $\lambda_1$, $\lambda_2$, $\lambda_3$ corresponding to the three 
hosts, subject to $\lambda = \lambda_1 + \lambda_2 + \lambda_3$, with 
the aim of minimizing the overall job response time of the system 
as in (5). We then conducted experiments with job allocation 
probabilities to the three hosts selected so that the arrival 
streams to the three hosts had the rates recommended by the 
analytical solution. 
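
To make the idea of an analytically computed split concrete, here is a minimal Python sketch. It assumes that each host behaves as an independent M/M/1 queue with a known service rate, which is an assumption of this example and not necessarily the exact model behind the paper's equation. Under that assumption, minimizing the mean response time subject to the per-host rates summing to the total arrival rate gives the well-known square-root rule, with hosts that are too slow receiving no traffic. The function name and arguments are hypothetical.

```python
import math


def optimal_rates(total_rate, service_rates):
    """Split total_rate among hosts modeled as parallel M/M/1 queues (assumed).

    Minimizes sum_i rate_i / (mu_i - rate_i) subject to sum_i rate_i = total_rate
    and rate_i >= 0. Returns one arrival rate per host, in input order.
    """
    if total_rate >= sum(service_rates):
        raise ValueError("total arrival rate must be below total service capacity")
    # Consider hosts from fastest to slowest; the slowest may be left idle.
    active = sorted(range(len(service_rates)),
                    key=lambda i: service_rates[i], reverse=True)
    while True:
        mu_sum = sum(service_rates[i] for i in active)
        root_sum = sum(math.sqrt(service_rates[i]) for i in active)
        scale = (mu_sum - total_rate) / root_sum
        slowest = active[-1]
        # Stationarity condition: rate_i = mu_i - sqrt(mu_i) * scale.
        if service_rates[slowest] - math.sqrt(service_rates[slowest]) * scale >= 0:
            break
        active.pop()  # the slowest host would get a negative rate: drop it
    rates = [0.0] * len(service_rates)
    for i in active:
        rates[i] = service_rates[i] - math.sqrt(service_rates[i]) * scale
    return rates
```

Dividing each returned rate by the total arrival rate gives the allocation probabilities that a dispatcher would use to produce sub-streams with those rates.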

We also compared two static allocation schemes: Round 
Robin where successive jobs are sent to each host of the cluster 
in turn, and an equally probable allocation where a job is 
dispatched to each host with equal probability 0.33. 
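
The two static schemes can be sketched in a few lines of Python. The helper names and the host labels in the usage note are placeholders introduced for this illustration.

```python
import itertools
import random


def round_robin(hosts):
    """Return an iterator that yields the hosts in a fixed cyclic order."""
    return itertools.cycle(hosts)


def equally_probable(hosts, rng):
    """Pick one host uniformly at random, independently for every job."""
    return rng.choice(hosts)
```

For example, with `hosts = ["h1", "h2", "h3"]`, calling `next()` repeatedly on `round_robin(hosts)` visits h1, h2, h3, h1, and so on, while `equally_probable(hosts, random.Random())` selects each host with probability 1/3.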

All these experiments were repeated for a range of average 
job arrival rates $\lambda$ equal to 1, 2, 4, 8, 12, 16, 20, 25, 30, 40 
jobs per second. Each experiment lasted 5 minutes so that the 
system could reach a stable state. 


A. Comparison of the RNN and the Sensible Algorithm 

We compared the two approaches with regard to task 
allocation based on the average job response time at the controller, 
the average job response time at the host and the average job 
execution time. The three metrics exhibit the same trend, as 
shown in Figure 6. At low job arrival rates of less than 8 jobs per 
second, the RNN with RL performs better, as shown in Figure 6(d), 
and this is even clearer with constant job arrival rates. However, 








Fig. 5. Task allocation testbed 


TABLE I 
Experiment Scenarios 

Notation | Description 
RNNs (RT) with CR | Random Neural Network algorithm with online measurement of the job response time at the controller and constant job arrival rates 
RNNs (RT) with EXP | Random Neural Network algorithm with online measurement of the job response time at the controller and exponentially distributed job interarrival time 
RNNs (ET) with CR | Random Neural Network algorithm with online measurement of the job execution time and constant job arrival rates 
RNNs (ET) with EXP | Random Neural Network algorithm with online measurement of the job execution time and exponentially distributed job interarrival time 
Sensible Decision with CR | Sensible Decision algorithm with online measurement of the job response time at the controller and constant job arrival rates 
Sensible Decision with EXP | Sensible Decision algorithm with online measurement of the job response time at the controller and exponentially distributed job interarrival time 


as the average job arrival rate grows, the sensible decision 
algorithm outperforms the RNN, as in Figure 6(c). Also, the 
RNN algorithm with online measurement of the job execution 
time always performs better than the RNN with the metric of 
job response time. However, the sensible decision algorithm is 
always best under high job arrival rates, as shown in Figure 6(c). 


To explain these experimental results, we note that the 
experiments use CPU intensive jobs, each of which experiences a 
longer execution time than when it is executed alone, due 
to the competition for the same physical resource, the CPU. 
Indeed, the hosts are multicore machines running Linux with a 
multitasking capability, so that multiple jobs run together 
and interfere with each other, as shown in Figure 3. For 
example, if four jobs are running in parallel, the average 
execution/response time per job increases twofold. 
That is to say, the fluctuation of the execution time that the jobs 
experience as the number of jobs in the system varies is quite 
significant. Since the RNN with RL sends the jobs to the 
best performing hosts, it tends to overload them, contrary to 
the Sensible Algorithm, which dispatches jobs probabilistically 
and therefore tends to spread the load more evenly. 

Fig. 6. The average job execution time, the average job response time at the 
hosts and the average job response time at the controller in the six experiment 
scenarios versus the average job arrival rate. 

When the RNN used the job execution time as the QoS 
criterion, Figure 7(a) shows that it correctly dispatched the 
majority of jobs to Host 3, which provided the shortest service time. 
The other two hosts accommodated some jobs because the 
RNN algorithm was programmed to make 10% of its decisions 
at random with equal probability. Here, the sensible decision 
algorithm performed worse because it makes each job allocation 
decision with a probability that is inversely proportional to the 
job response time/execution time, instead of exactly following 
the best QoS as the RNN does. As shown in Figure 7(b), the 
proportions of the jobs allocated by the sensible decision 
algorithm coincide with the relative speeds of the three hosts. 
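
The contrast between the two decision rules can be sketched as follows. The first two functions implement the inverse-proportional rule stated above for the sensible decision algorithm. The third is only a simplified stand-in for the RNN with RL: it picks the host with the smallest measured time and makes a fraction of its choices at random, which mimics the 10% random decisions mentioned in the text but does not implement the Random Neural Network or its learning rule. All names and the sample measurements are hypothetical.

```python
import random


def sensible_probabilities(measured_times):
    """Allocation probabilities inversely proportional to the measured times."""
    weights = [1.0 / t for t in measured_times]
    total = sum(weights)
    return [w / total for w in weights]


def sensible_choice(measured_times, rng):
    """Draw one host index according to the sensible probabilities."""
    probs = sensible_probabilities(measured_times)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_with_exploration(measured_times, rng, explore=0.10):
    """Stand-in for a 'follow the best host' rule (not the RNN itself).

    With probability `explore` pick a host uniformly at random; otherwise
    pick the host with the smallest measured time.
    """
    if rng.random() < explore:
        return rng.randrange(len(measured_times))
    return min(range(len(measured_times)), key=lambda i: measured_times[i])
```

With hypothetical measured times of 0.5, 0.25 and 1.0 seconds, the sensible rule allocates jobs in the proportions 2/7, 4/7 and 1/7, whereas the greedy stand-in sends roughly 93% of the jobs to the second host. This is why the greedy rule concentrates load on the best host while the sensible rule spreads it.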


[Plot panels: x-axis "Average Job Arrival Rate (per second)"; legend: RNN using response time, RNN using execution time, Sensible allocation, Round Robin, Equal Probability, Fixed Arrival Rate.] 

Fig. 7. The proportion of job allocations to the three hosts with the RNN 
and the sensible decision algorithm for different arrival rates and Poisson job 
arrivals to TAP. (a) Job allocation with the RNN algorithm under varied job 
arrival rates. (b) Job allocation with the sensible decision algorithm under 
varied job arrival rates. 

On the other hand, the sensible decision algorithm benefits 
from the fact that it does not overload the “best” hosts, as 
shown in Figure 6(c), where the jobs may sometimes arrive 
at a host at a rate that is higher than its average processing 
rate. In Figure 6 we also see that the RNN based algorithm 
that uses the job execution time measured at the hosts as the 
QoS goal outperforms the RNN with online measurement of 
the job response time, because the actual job execution time can 
be a more accurate predictor of overall performance when the 
communication times between the hosts and TAP fluctuate 
significantly. However, at high job arrival rates, the sensible 
decision algorithm again performed better. 

B. Comparison with the Model Based and Static Allocation 
Schemes 

Figure 8 shows the average job execution time obtained with 
the RNN and the Sensible Algorithm, in comparison with the 
model based scheme, as well as the Round Robin and Equally 
Probable allocation. The model based scheme performed better 
than the RNN when the job arrival rate was low, and better 
than the Sensible Algorithm at high arrival rates. However, the 
model based scheme can be viewed as an “ideal benchmark” 
since it relies on full information: it assumes knowledge of the 
arrival rate, it supposes that arrivals are Poisson, and it assumes 
knowledge of the job service rates at each host, while the RNN 
based scheme just observes the most recent measurement of 
the goal function. 

Fig. 8. The average execution time experienced under varied job arrival 
rates and different task allocation schemes when the three hosts have similar 
performance. 

As expected, the equally probable allocation scheme 
performed worst. In this case, where all servers are roughly 
equivalent in speed, Round Robin always outperformed the Sensible 
Algorithm, because it distributes work in a manner that does 
not overload any of the servers. These results are summarized 
in Figure 8(a). However, the observed results change when 
the hosts have distinct performance characteristics, as shown 
below. 


C. Performance Measurements when Hosts have Distinct 
Processing Rates 

As a last step, we evaluate the algorithms that we have 
considered in a situation where each host provides 
significantly different performance. Since the hosts we have for our 
experiments are quite similar, we introduced a background 
load on each host which runs constantly and independently 
of the tasks that TAP allocates to the hosts. This is in fact a 
realistic situation since in a Cloud, multiple sources of tasks 
may share the same set of hosts without knowing what their 
precise workload may be, except for external observations of 
their performance. 

Fig. 9. Average execution time experienced in a cluster composed of hosts 
with non-uniform processing capacities. Panels (a) and (b); x-axis: Incoming 
Job Rate (per second). 

Thus we were able to emulate a set of three Hosts 1, 2, 3 
with relative processing speeds of 2 : 4 : 1. The results 
of these experiments are summarized in Figure 9. We see 
that TAP with both the RNN and the Sensible Algorithm 
benefits from the ability of these two schemes to measure the 
performance differences between the hosts, and dispatch jobs 
to the hosts which offer a better performance, whereas the two 
static allocation schemes (Round Robin and the allocation of 
tasks with equal probabilities) lead to worse performance as a 
whole. 

The performance of the RNN-based scheme clearly stands 
out among the others, as shown in Figure 9(b), confirming that 
a system such as TAP equipped with the RNN can provide a 
very useful fine-grained QoS-aware task allocation algorithm. 


IV. Conclusions and Future Work 

In this paper we have presented TAP, a task allocation 
platform which can incorporate a variety of different algorithms 
to dispatch jobs to hosts in the Cloud. TAP can exploit both 
simple static allocation schemes (such as Round Robin), 
as well as measurement driven adaptive on-line algorithms 
such as the RNN and the Sensible Algorithms that bring 
intelligence to bear from observations and make judicious 
allocation decisions. We conducted numerous experiments with 
a CPU intensive workload to evaluate both static and adaptive 
allocation schemes in two different hosting environments: one 
composed of hosts with very similar processing speeds, and 
another one with hosts having different speeds due to distinct 
background loads at each host. 

Experiments showed that when the hosts are quite 
distinct, the RNN based algorithm with Reinforcement Learning 
offered a fine-grained QoS-aware task allocation algorithm 
which can make accurate decisions provided that the online 
measurements are regularly updated. We found that the 
Sensible Algorithm offers a robust QoS-aware scheme with the 
potential to perform better under heavy loads. The fixed arrival 
rate scheme, with full information of arrival rates and service 
rates, outperformed both the RNN and the “sensible” approach 
because it employs the solution of an analytical model 
that allows one to minimize job response time under known 
mathematical assumptions, which may not actually be known or 
valid in practice: it is thus useful as a benchmark but cannot be 
recommended in practical situations. Round Robin is a simple 
algorithm, which is effective when the processing rates and 
loads at each of the hosts are very similar. 

In future work we will investigate the use of more 
sophisticated mathematical models such as diffusion approximations 
[45] to build a model driven allocation algorithm that exploits 
on-line measurements of the arrival and service statistics at 
each of the hosts in order to estimate the task allocation 
probabilities. Although we expect that such an approach will 
have its limits due to the increase in the amount of data that it 
will need, we also think that it may offer a better benchmark 
for the comparison of various allocation methods. We would 
also like to study the Cloud system we have described when 
a given set of hosts is used by multiple TAP systems with 
heterogeneous input streams (such as Web services, mobile 
services and compute intensive applications) to see which 
schemes offer the most robust and resilient allocation 
in the presence of competing and diverse workloads. 
Another direction we wish to pursue is the study of the 
robustness of allocation schemes for Cloud services in the 
presence of attacks [49] designed to disrupt normal operations. 